      PATTERN-MATCHING RULES FOR THE RECOGNITION OF
         NATURAL LANGUAGE DIALOGUE EXPRESSIONS

		Kenneth Mark Colby
		        and
		  Roger C. Parkison


		    INTRODUCTION

	To recognize something is to identify it as an instance of
the "same again". This familiarity is possible because of recurrent
characteristics of the world. We shall describe an algorithm which
recognizes recurrent characteristics of natural language dialogue
expressions. It utilizes a multi-stage sequence of pattern-matching
rules for progressively transforming these input expressions into a
pattern which eventually best matches a more abstract stored pattern.
The name of the stored pattern has a pointer to the name of a
response function in memory which decides what to do once the input
has been recognized. Here we shall discuss only the recognizing
functions, except for one response function (anaphoric substitution)
which interactively aids the recognition process. Details of how the
response functions operate will be described in a future
communication by William Faught.
	In constructing and testing a simulation of paranoid
processes, we were faced with the problem of reproducing paranoid
linguistic behavior in a teletyped diagnostic psychiatric interview.
The diagnosis of paranoid states, reactions, or modes is made by
clinicians who judge a degree of correspondence between what they
observe linguistically in an interview and their conceptual model of
paranoid behavior. There exists a high degree of agreement among
psychiatrists about this conceptual model, which relies mainly on
what an interviewee says and how he says it.
	Natural language is a life-expressing code people use for
communication with themselves and others. In a real-life dialogue
such as a psychiatric interview, the participants have interests,
intentions, and expectations which are revealed in their linguistic
expressions. To produce effects on an interviewer which he would
judge similar to the effects produced by a paranoid patient, an
interactive simulation of a paranoid patient must be able to
demonstrate typical paranoid interview behavior. To achieve the
desired effects, the paranoid model must have the ability to deal
with the teletyped linguistic behavior of the interviewer.
	There are a number of approaches one might consider for an
ideal handling of dialogue expressions. One approach would be to
construct a dictionary of all expressions which could possibly arise
in an interview. Associated with each expression would be its
interpretation, depending on dialogue context, which in turn must
somehow be defined. For obvious reasons, no one takes this approach
seriously. Instead of an expression dictionary, one might construct
a word dictionary and then use projection rules to yield an
interpretation of a sentence from the dictionary definitions. This
has been the approach adopted by most linguistic parsers based on
transformational or systemic grammars. Such a method performs
adequately as long as the dictionary involves only a few hundred
words, each word having only one or two senses, and the dialogue is
limited to a mini-world of a few objects and relations. But the
problems which arise in a real-time psychiatric interview conducted
in unrestricted English are too great for this method to be useful
in actual dialogues requiring immediate responses.
	There is little consensus knowledge about how humans process
natural language. They can be shown to possess some knowledge of
grammar rules, but this does not entail that they use a grammar in
interpreting and producing language. It seems implausible to us that
people possess full transformational grammars for processing
language. Language is what is recognized, but the processes involved
may not be linguistic or grammatical. Transformational grammars were
not originally designed to "understand" a large subset of English;
they represented a formal method for deciding whether a string is
grammatical.
	An analysis of what one's problem actually is should guide
the selection or invention of methods appropriate to that problem's
solution. Our problem was not to develop a consistent and general
theory of language, nor to assert empirically testable hypotheses
about how people process language. Our problem was to design an
algorithm which recognizes what is being said in a dialogue and what
is being said about it, in order to make a response such that a
sample of I-O pairs from the paranoid model is judged similar to a
sample of I-O pairs from paranoid patients. The design task was one
for artificial intelligence, which attempts to get computers to
perform human intellectual functions. New methods had to be devised
for an algorithm to participate in a human dialogue in a
paranoid-patient-like way. We are not claiming that our methods
represent the way people process language. We sought effective
methods which could operate efficiently in real time. Since our
methods provide a general many-to-one mapping from surface
expressions to a single stored pattern, they can be used by any type
of "host" system which takes natural language as input.
	To perform the task of managing communicative uses and
effects of natural language, we developed a strategy of transforming
the input until a pattern is achieved which matches, completely or
partially, a more abstract stored pattern. This strategy has proved
adequate for our purposes a satisfactory percentage of the time. (No
one expects any method to be successful 100% of the time, since not
even humans, the best natural language processors around, achieve
this level of performance.) The power of this method for natural
language dialogues lies in its ability to ignore as irrelevant some
of what it recognizes and everything it does not recognize at all. A
conventional parser doing word-by-word, parts-of-speech analysis
fails when it cannot find one or more of the input words in its
dictionary. Because it must know every word, it is too fragile for
unrestricted dialogues.
	In early versions of the paranoid model (PARRY1), many (but
not all) of the pattern recognition mechanisms allowed the elements
of the pattern to be order-independent (Colby, Weber, and Hilf,
1971). For example, consider the following expressions:
	(1) WHERE DO YOU WORK?
	(2) WHAT SORT OF WORK DO YOU DO?
	(3) WHAT IS YOUR OCCUPATION?
	(4) WHAT DO YOU DO FOR A LIVING?
	(5) WHERE ARE YOU EMPLOYED?
In PARRY1 a procedure would scan these expressions looking for an
information-bearing contentive such as "work", "for a living", etc.
If it found such a contentive along with a "you" or "your" in the
expression, regardless of word order, it would respond to the
expression as if it were a question about the nature of one's work.
This method correctly classifies the five sentences above.
Unfortunately, it places the two examples below in the same category:
	(6) DOES YOUR FATHER'S CAR WORK?
	(7) HOW DID THINGS WORK OUT FOR YOU?
An insensitivity to word order has the advantage that lexical items
representing different parts of speech can represent the same
concept, e.g. "work" as noun or as verb. But a price is paid for this
resilience and elasticity. We found from experience that, since
English relies heavily on word order to convey the meaning of its
messages, the average penalty of misunderstanding (to be
distinguished from ununderstanding) was too great. Hence in PARRY2,
as will be described shortly, the patterns require a specified word
order.
	For high-complexity problems it is necessary to have
constraints. Diagnostic psychiatric interviews (and especially those
conducted over teletypes) have several natural constraints. First,
clinicians are trained to ask certain questions in certain ways.
This limits the number of patterns required to recognize utterances
about each topic. Second, only a few hundred standard topics are
brought up by interviewers, who are trained to use everyday
expressions and especially those used by the patient himself. When
the interview is conducted over teletypes, expressions tend to be
shortened since the interviewer tries to increase the information
transmission rate over the slow channel of a teletype. Finally,
teletyped interviews consist of written utterances. Utterances are
known to be highly redundant, so unrecognized words can be ignored
without losing the meaning of the message. Utterances are also
loaded with idioms, cliches, pat phrases, etc., all easy prey for a
pattern-matching approach. It is time-wasting and usually futile to
try to decode an idiom by analyzing the meanings of its individual
words.
	We shall now describe the pattern-matching functions of the
algorithm in some detail. (See Fig. 1 for a diagram of the overall
flow of control.)


		    OVERVIEW

	PARRY2 has two primary modules. The first attempts to
RECOGNIZE the input and the second RESPONDS. This paper is primarily
about the RECOGNIZE module. It functions independently of the
RESPOND module except in the case of pronoun references, which the
RESPOND module provides to the RECOGNIZER on request.
	The recognition module has 4 main steps:
	1) Identify the words in the question and convert them to
		internal synonyms.
	2) Break the input into segments at certain bracketing words.
	3) Match each segment (independently) to a stored pattern.
	4) Match the resulting list of recognized segments to a stored
		compound pattern.
Each of these steps, except the segmenting, throws away what it
cannot identify. Because unidentified material is discarded, an
utterance referring to a topic which PARRY2 has some ability to
recognize has an increased chance of ultimately being recognized.
Occasionally, a reference to an unknown topic is mis-recognized as
some familiar topic.
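
	To make the flow of control concrete, the following is a
minimal sketch of the four steps in Python (the model itself ran on
a PDP-10 and was not written in Python; the helper names below are
hypothetical stand-ins for the procedures described in the sections
that follow).

        def recognize(utterance, dictionary, simple_patterns, compound_patterns):
            # Step 1: look words up and convert them to internal
            # synonyms; unknown words are dropped (see PREPROCESSING).
            words = preprocess(utterance, dictionary)
            # Step 2: break the word list into segments at bracketing
            # words (see SEGMENTING).
            segments = segment(words)
            # Step 3: match each segment independently to a stored
            # simple pattern; unmatched segments are thrown away.
            matched = [m for m in (match_simple(s, simple_patterns)
                                   for s in segments) if m is not None]
            # Step 4: match the list of recognized segments to a
            # stored compound pattern.
            return match_compound(matched, compound_patterns)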

		    PREPROCESSING

	Each word in the input expression is first looked up in a
dictionary of (currently) about 1900 entries which, for the sake of
speed, is maintained in core during run-time. (The complete language
recognition process requires less than one second of real time on a
time-shared DEC PDP-10.) The dictionary, which was built empirically
from thousands of teletyped interviews with previous versions of the
model, consists of words, groups of words, and the names of the
word-classes they can be translated into. (See Fig. 2 for
illustrative examples of dictionary entries.) The size of the
dictionary is determined by the patterns, i.e. a word is not
included unless it appears in one or more patterns. Entries in the
dictionary thus reflect PARRY2's main interests. If a word in the
input is not in the dictionary, it is dropped from the pattern being
formed. Thus if the input were:
	WHAT IS YOUR CURRENT OCCUPATION?
and the word "current" were not in the dictionary, the pattern at
this stage becomes:
	( WHAT IS YOUR OCCUPATION )
The question mark is thrown away as redundant, since questions are
recognized by word order. (A statement followed by a question mark
(YOU GAMBLE?) is responded to in the same way as that statement
followed by a period.) Synonymic translations of words are made so
that the pattern becomes, for example:
	( WHAT BE YOU JOB )
	Some groups of words (i.e. idioms) are translated as a group
so that, for example, "for a living" becomes "for job". Certain
other juxtaposed words are contracted into a single word, e.g.
"place of birth" becomes "birthplace". This is done to deal with
groups of words which are represented as a single element in the
stored pattern, thereby preventing segmentation from occurring at
the wrong places, such as at a preposition inside an idiom or
phrase. Besides these contractions, certain expansions are made so
that, for example, "DON'T" becomes "DO NOT" and "I'D" becomes
"I WOULD".
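
	A minimal sketch of this preprocessing follows, assuming
illustrative dictionary, idiom, and expansion tables (PARRY2's
actual dictionary held about 1900 empirically gathered entries):

        DICTIONARY = {"what": "WHAT", "is": "BE", "are": "BE",
                      "your": "YOU", "you": "YOU",
                      "occupation": "JOB", "work": "JOB"}
        IDIOMS = {("for", "a", "living"): ["FOR", "JOB"],
                  ("place", "of", "birth"): ["BIRTHPLACE"]}
        EXPANSIONS = {"don't": ["do", "not"], "i'd": ["i", "would"]}

        def preprocess(utterance, dictionary=DICTIONARY):
            # Trailing punctuation is redundant: questions are
            # recognized by word order, not by the question mark.
            words = utterance.lower().rstrip("?.!").split()
            expanded = []
            for w in words:                      # expand contractions
                expanded.extend(EXPANSIONS.get(w, [w]))
            out, i = [], 0
            while i < len(expanded):
                for idiom, repl in IDIOMS.items():  # translate idioms as a group
                    if tuple(expanded[i:i + len(idiom)]) == idiom:
                        out.extend(repl)
                        i += len(idiom)
                        break
                else:
                    if expanded[i] in dictionary:   # unknown words are dropped
                        out.append(dictionary[expanded[i]])
                    i += 1
            return out

        # preprocess("WHAT IS YOUR CURRENT OCCUPATION?")
        #   -> ['WHAT', 'BE', 'YOU', 'JOB']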
	Misspellings can be the bane of teletyped interviews for an
algorithm. Here they are handled in two ways. First, common
misspellings of important words are simply put in the dictionary.
Thus "yoy" is known to mean "you". The apostrophe is often omitted
from contractions, so most contractions are recognized with or
without it. These common misspellings were gathered from over 4000
interviews with earlier versions of the paranoid model.
	Second, there are 5 common forms of typing error which are
checked systematically. These are:
	1) Doubled letter
	2) Extraneous letter
	3) Forgetting to hold the "shift key" for an apostrophe
	4) Hitting a nearby key on the keyboard
	5) Transposing two letters in a word
The first three errors can be corrected by deleting the offending
character from the word. This is accomplished by deleting each
character in turn until the word becomes a recognized word. The
fourth type of error is checked only for 10 of the more common near
misses. These were also empirically determined and include letter
pairs like (T Y), (O P), (A S), and (N M). These methods are all
based on typing errors, but they also correct some legitimate
English spelling errors. Two-letter transposition corrects, for
example, "beleive" to "believe".
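
	One plausible rendering of these five checks is sketched
below; the known predicate stands for dictionary lookup, and the
near-miss pairs shown are only a subset of the ten actually used.

        NEARBY = [("t", "y"), ("o", "p"), ("a", "s"), ("n", "m")]

        def correct(word, known):
            # Errors 1-3 (doubled letter, extraneous letter, stray
            # character typed instead of an apostrophe) are all
            # repaired by deleting each character in turn until a
            # recognized word appears.
            for i in range(len(word)):
                candidate = word[:i] + word[i + 1:]
                if known(candidate):
                    return candidate
            # Error 4: a nearby key was hit; try common near misses.
            for i, c in enumerate(word):
                for pair in NEARBY:
                    if c in pair:
                        other = pair[0] if c == pair[1] else pair[1]
                        candidate = word[:i] + other + word[i + 1:]
                        if known(candidate):
                            return candidate
            # Error 5: two adjacent letters were transposed, as in
            # "beleive" -> "believe".
            for i in range(len(word) - 1):
                candidate = word[:i] + word[i + 1] + word[i] + word[i + 2:]
                if known(candidate):
                    return candidate
            return word   # still unrecognized: the word is dropped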

                      SEGMENTING

	Another weakness in the crude pattern matching of PARRY1 was
that it took the entire input expression as its basic processing
unit. Hence if only two words were recognized in an eight-word
utterance, the risk of misunderstanding was great. We needed a way
of dealing with units shorter than the entire input expression.
	Expert telegraphists stay six to twelve words behind a
received message before transcribing it. Translators wait until they
have heard 4-6 words before they begin translating. Aided by a
heuristic from work in machine translation (Wilks, 1973), we devised
a way of bracketing the pattern constructed up to this point into
shorter segments, using prepositions, wh-forms, certain verbs, etc.
as bracketing points. (A list of the bracketing terms appears in
Fig. 3.) These tend to separate prepositional phrases and embedded
clauses from the main clause. The new pattern formed is termed
either "simple", having no delimiters within it, or "compound",
i.e., being made up of two or more simple patterns. A simple pattern
might be:
	( WHAT BE YOU JOB )
whereas a compound pattern would be:
	(( WHY BE YOU ) ( IN HOSPITAL ))
Our experience with this method of segmentation shows that compound
patterns from teletyped psychiatric dialogues rarely consist of more
than three or four segments.
	After certain verbs (see Fig. 4) a bracketing occurs to
replace the commonly omitted "THAT", such that:
	( I THINK YOU BE AFRAID )
becomes
	(( I THINK ) ( YOU BE AFRAID ))
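
	A minimal sketch of the segmenter, with an illustrative
subset of bracketing terms (the full lists appear in Figs. 3 and 4):

        DELIMITERS = {"WHAT", "WHY", "HOW", "IN", "FOR", "TO"}
        THAT_VERBS = {"THINK", "BELIEVE", "KNOW"}  # "THAT" often omitted after these

        def segment(words):
            segments, current = [], []
            for w in words:
                if w in DELIMITERS and current:  # bracket before a delimiter
                    segments.append(current)
                    current = []
                current.append(w)
                if w in THAT_VERBS:              # bracket after a THAT-verb
                    segments.append(current)
                    current = []
            if current:
                segments.append(current)
            return segments

        # segment(['WHY', 'BE', 'YOU', 'IN', 'HOSPITAL'])
        #   -> [['WHY', 'BE', 'YOU'], ['IN', 'HOSPITAL']]
        # segment(['I', 'THINK', 'YOU', 'BE', 'AFRAID'])
        #   -> [['I', 'THINK'], ['YOU', 'BE', 'AFRAID']]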

		   MATCHING INDIVIDUAL SEGMENTS

	Conjunctions serve only as markers for the segmenter, and
they are dropped out after segmentation.
	Negations are handled by extracting the "NOT" from the
segment and assigning a value to a global variable which indicates
that the expression is negative in form. When a pattern is finally
matched, this variable is consulted. Some patterns have a pointer to
a pattern of opposite meaning if a "NOT" could reverse their
meanings. If this pointer is present and a "NOT" was found, then the
pattern matched is replaced by its opposite, e.g. ( I NOT TRUST
YOU ) is replaced by the pattern ( I MISTRUST YOU ). We have not yet
observed the troublesome case of "he gave me not one but two
messages". (There is no need to scratch where it doesn't itch.)
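
	A sketch of this negation handling, assuming an illustrative
opposites table (in PARRY2 the opposite is a pointer stored with the
pattern, and the negation flag is a global variable):

        OPPOSITES = {("I", "TRUST", "YOU"): ("I", "MISTRUST", "YOU")}

        def extract_not(segment):
            # Remove "NOT" and remember that the segment was negative
            # in form.
            negative = "NOT" in segment
            return [w for w in segment if w != "NOT"], negative

        def apply_negation(pattern, negative):
            # Consulted after the match: swap in the opposite pattern
            # if one exists and a "NOT" was found.
            if negative and pattern in OPPOSITES:
                return OPPOSITES[pattern]
            return pattern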
	Substitutions are also made in certain cases. Some segments
contain pronouns which could stand for a number of different things
of importance to PARRY2. As we mentioned in the introduction, there
is one case in which the response functions of memory aid in
language recognition. One function of memory is to keep track of the
context in order to give pronouns and other anaphoras a correct
interpretation. For example, the segment:
	( DO YOU AVOID THEM )
could refer to the Mafia, or racetracks, or other patients,
depending on the context. When such a segment is encountered, the
pronoun is replaced by its current anaphoric value as determined by
the response functions, and a more specific segment such as:
	( DO YOU AVOID MAFIA )
is looked up.
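
	A sketch of the substitution step, representing the context
supplied by the RESPOND module as a simple mapping from pronoun to
current referent:

        PRONOUNS = {"THEM", "THEY", "IT", "HE", "SHE"}

        def substitute_anaphora(segment, context):
            # context might be {"THEM": "MAFIA"}, as maintained by
            # the response functions tracking the dialogue.
            return [context.get(w, w) if w in PRONOUNS else w
                    for w in segment]

        # substitute_anaphora(['DO', 'YOU', 'AVOID', 'THEM'],
        #                     {"THEM": "MAFIA"})
        #   -> ['DO', 'YOU', 'AVOID', 'MAFIA']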
	There are other utterances, such as "Why did you do that?" or
just "Why?" (which might be regarded as a massive ellipsis), which
clearly refer back to previous utterances. These utterances match
very general patterns which identify the type of question without
indicating the exact topic. The response function which responds to
"Why?" consults the context to produce an appropriate answer.
	The algorithm now attempts to match the segments with stored
simple patterns, which currently number about 1700. First a complete
and perfect match is sought. When a match is found, the stored
pattern name has a pointer to the name of a response function in
memory which decides what to do further. If a match is not found,
further transformations of the segment are carried out and a "fuzzy"
match is tried.
	For fuzzy matching at this stage, we adopted the heuristic
rule of dropping elements in the segment one at a time and
attempting a match each time. This heuristic allows familiar words
to be ignored in unfamiliar contexts. For example, "well" is
important in "Are you well?" but meaningless in "Well are you?".
	Deleting one element at a time results in, for example, the
pattern:
		( WHAT BE YOU MAIN PROBLEM )
becoming successively:
		(a) ( BE YOU MAIN PROBLEM )
		(b) ( WHAT YOU MAIN PROBLEM )
		(c) ( WHAT BE MAIN PROBLEM )
		(d) ( WHAT BE YOU PROBLEM )
		(e) ( WHAT BE YOU MAIN )
Since the stored pattern in this case matches (d), (e) would not be
constructed. We found it unwise to delete more than one element,
since our segmentation method usually yields segments containing a
small number of words (1-4).
	Dropping an element at a time provides a probability
threshold for fuzzy matching which is a function of the length of
the segment. If a segment consists of five elements, four of the
five must be present in a particular order (with the fifth element
missing in any position) for a match to occur. If a segment contains
four elements, three must match, and so forth.
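
	A sketch of simple-pattern matching with this one-deletion
fuzzy match; here patterns stands for the store of about 1700 simple
patterns, represented for illustration as a set of tuples:

        def match_simple(segment, patterns):
            key = tuple(segment)
            if key in patterns:               # complete, perfect match
                return key
            for i in range(len(key)):         # drop one element at a time
                candidate = key[:i] + key[i + 1:]
                if candidate in patterns:
                    return candidate
            return None                       # no match at all

        # With ('WHAT', 'BE', 'YOU', 'PROBLEM') stored, the segment
        # ('WHAT', 'BE', 'YOU', 'MAIN', 'PROBLEM') matches on
        # deletion (d) above; (e) is never constructed.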

		   COMPOUND-PATTERN MATCH

	When more than one simple pattern is detected in the input, a
second matching is attempted against about 500 compound patterns.
Certain patterns, such as ( HELLO ) and ( I THINK ), are dropped
because they are considered meaningless. If a complete match is not
found, then simple patterns are dropped, one at a time, from the
compound pattern. This allows the input:
	(( HOW DO YOU COME ) ( TO BE ) ( IN HOSPITAL ))
to match the stored pattern:
	(( HOW DO YOU COME ) ( IN HOSPITAL ))
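
	The compound match can be sketched the same way, mirroring
the one-deletion heuristic at the level of whole segments (the
MEANINGLESS table is illustrative):

        MEANINGLESS = {("HELLO",), ("I", "THINK")}

        def match_compound(matched_segments, compound_patterns):
            # Drop segments considered meaningless, then try an
            # exact match against the stored compound patterns.
            segs = tuple(s for s in matched_segments
                         if s not in MEANINGLESS)
            if segs in compound_patterns:
                return segs
            for i in range(len(segs)):   # drop one simple pattern at a time
                candidate = segs[:i] + segs[i + 1:]
                if candidate in compound_patterns:
                    return candidate
            return None                  # default condition: RESPOND decides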

	If no match can be found at this point, the algorithm has
arrived at a default condition and the appropriate response
functions decide what to do. For example, in a default condition,
the model may assume control of the interview, asking the
interviewer a question, continuing with the topic under discussion,
or introducing a new topic.
	An annotated example of a diagnostic psychiatric interview is
presented in the appendix.


		   ADVANTAGES AND LIMITATIONS

	As mentioned, one of the main advantages of a
pattern-matching strategy is that it can ignore as irrelevant both
some of what it recognizes and what it does not recognize at all.
There are several million words in English, each possessing from one
to over a hundred senses. To construct a machine-usable word
dictionary of this magnitude is out of the question at this time.
Recognition of natural language input such as described above allows
real-time interaction in a dialogue, since it avoids becoming
ensnared in combinatorial disambiguations and long chains of
inferencing which would slow a dialogue algorithm down to
impracticality, if it could function at all. The price paid for
pattern-matching is that sometimes, but rarely, ambiguities slip
through. Eventually a rewrite interpreter must be added to the
pattern-matching rules to deal with ambiguities (Enea and Colby,
1973).
	A drawback to PARRY1 was that it reacted to the first pattern
it found in the input rather than characterizing the input as fully
as possible and then deciding what to do based on a number of tests.
Another practical difficulty with PARRY1, from a programmer's
viewpoint, was that elements of the patterns were strung out in
various procedures throughout the algorithm. It was often a
considerable chore for the programmer to determine whether a given
pattern was present and precisely where it was. In the
above-described method, the patterns are all collected in one part
of the data-base, where they can easily be examined.
	Concentrating all the patterns in the data-base gives PARRY2
a limited "learning" ability. When an input fails to match any
stored pattern or matches an incorrect one, as judged by a human
operator, a pattern which matches the input can be put into the
data-base automatically. If the new pattern has the same meaning as
a previously stored pattern, the human operator must provide the
name of the appropriate response function. If he doesn't remember
the name, he may try to rephrase the input in a form recognizable to
PARRY2, and it will name the response function associated with the
rephrasing. These mechanisms are not "learning" in the commonly used
sense, but they do allow a person to transfer his knowledge into
PARRY2's data-base with very little effort.
	Informal observation thus far shows PARRY2's linguistic
recognition abilities to be quite superior to PARRY1's. A more
systematic and quantitative evaluation of performance will be
carried out in the near future.


                      REFERENCES

Colby, K.M., Weber, S., and Hilf, F.D. (1971). Artificial Paranoia.
	ARTIFICIAL INTELLIGENCE, 2, 1-25.

Enea, H. and Colby, K.M. (1973). Idiolectic language-analysis for
	understanding doctor-patient dialogues. THIRD INTERNATIONAL
	JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 278-284.

Wilks, Y. (1973). An artificial intelligence approach to machine
	translation. In COMPUTER MODELS OF THOUGHT AND LANGUAGE,
	Schank, R. and Colby, K.M., Eds., W.H. Freeman,
	San Francisco.